Abstract

Nonlinear  regression  with  measurement  error  is  important  for  estimation  from 
microeconomic  data.     One  approach  to  identification  and  estimation  is  a  causal  model, 
where  the  unobserved  true  variable  is  predicted  by  observable  variables.     This 
paper  is  about  estimation  of  such  a  model  using  simulated  moments  and  a  flexible 
disturbance  distribution.     An  estimator  of  the  asymptotic  variance  is  given  for 
parametric  models.     Also,   a  semiparametric  consistency  result  is  given.     The  value  of  the 
estimator is demonstrated in a Monte Carlo study and an application to estimating Engel
curves.
1.        Introduction

Nonlinear  regression  models  with  measurement  error  are  important  but  difficult  to 
estimate.     Measurement  error  is  a  common  problem  in  microeconomic  data,   where  nonlinear 
models  are  often  of  interest.     For  example,   flexible  functional  forms  often  lead  to 
inherently  nonlinear  specifications.     Instrumental  variables  estimators  are  not 
consistent  for  these  models,   as  discussed  in  Amemiya  (1985),   so  that  alternative 
approaches  must  be  adopted.     The  purpose  of  this  paper  is  to  develop  an  approach  that  is 
computationally  feasible  and  also  allows  for  flexibility  in  the  distribution  of 
disturbances.     This  purpose  is  accomplished  by  using  simulated  moments  estimation  with 
flexible  distributions,   an  approach  that  may  be  useful  for  simulated  moments  estimation 
of  other  models. 

The  measurement  error  model  considered  here  has  a  prediction  equation  for  the  true 
regressor  with  a  disturbance  that  is  independent  of  the  predictors.     The  estimator  is 
based  on  the  conditional  expectation  of  the  dependent  variable,   and  the  conditional 
expectation  of  the  product  of  the  dependent  variable  and  mismeasured  regressor.     This 
model  for  measurement  error  in  nonlinear  models  has  previously  been  considered  by 
Hausman,   Ichimura,  Newey,   and  Powell  (1991)  and  Hausman,  Newey,  and  Powell  (1995),  but 
only  for  the  case  of  polynomial  regression  or  approximation,  and  simulated  moments  was 
not  considered.     This  paper  allows  for  general  functional  forms,   significantly  extending 
the  scope  of  the  previous  work. 

Much  of  the  other  work  on  measurement  error  in  nonlinear  models  relies  heavily  on 
the  assumption  that  the  variance  of  the  measurement  error  is  small  relative  to  the  sample 
size.     These  papers  include  Wolter  and  Fuller  (1982)  and  Amemiya  (1985).     In  econometric 
practice  the  measurement  error  often  seems  quite  large  relative  to  the  sample  size,  and 
has  big  effects  on  the  coefficients.     Thus,   it  seems  important  to  consider  approaches 
that  allow  for  relatively  large  measurement  error,  as  does  the  one  here. 


Simulated moments estimation provides a computationally convenient approach when
estimating equations involve integrals, as discussed in Lerman and Manski (1981),
Pakes (1986), McFadden (1989), and Pakes and Pollard (1989). This approach uses Monte
Carlo methods to form unbiased estimators of integrals in moment equations. Allowing
flexibility in disturbance distributions is desirable, because consistency of the
estimator depends on correct specification of the distribution. Also, it is useful to
preserve the computational convenience of simulated moments. These goals are
accomplished by combining simulated moments estimation with a linear in parameters
specification for distribution shape. The specification parameterizes the ratio of the true
density to the simulated one. This approach is similar to the importance sampling
technique from the simulation literature.

The parametric simulated moments estimator we propose is essentially a generalized
method of moments estimator. Here the moments are smooth in the parameters, so that
standard  asymptotic  theory  applies.     For  that  reason  we  just  give  large  sample  inference 
procedures  with  an  outline  of  the  asymptotic  theory  for  the  parametric  case.     We  pay  more 
attention  to  conditions  for  consistency  for  the  nonparametric  case,   giving  a  consistency 
result  when  the  number  of  parameters  in  the  distribution  approximation  is  allowed  to 
grow  with  sample  size. 

The  paper  also  includes  Monte  Carlo  and  empirical  applications,  to  evaluate  the 
potential  impact  of  this  approach  for  applied  work.     The  empirical  application  is 
estimation  of  Engel  curves  from  household  expenditure  data.     The  measurement  error 
correction  makes  a  big  difference  in  the  application,  with  a  Gaussian  specification  for 
prediction  error  sufficing  in  most  cases.     Also,  the  estimator  seems  quite  accurate, 
having  small  standard  errors.     The  results  illustrate  the  usefulness  of  using  simulated 
moment  estimation  to  correct  for  measurement  error,  while  allowing  some  flexibility  in 
the  distribution  of  the  prediction  error. 

Section 2 describes the errors in variables model and some of its implications for
conditional moments. Section 3 lays out the estimation method and discusses parametric
asymptotic inference for the estimator. Section 4 gives a semiparametric consistency
result. Section 5 presents results of a small Monte Carlo study. Section 6 describes an
empirical example of Engel curve estimation of the relationship between income and
consumption.


2.        The Model


The model considered here is

(2.1)  $y = f(w^*, \delta_0) + \varepsilon$,   $E[\varepsilon \mid x, v] = 0$,

       $w = w^* + \eta$,   $E[\eta \mid x, v, \varepsilon] = 0$,

       $w^* = \pi_0' x + \sigma_0 v$,   $v$ independent of $x$,

where $y$ and $\varepsilon$ are scalars, $\delta_0$, $w^*$, $w$, $\eta$, $x$, and $v$ are vectors, and
$\pi_0$ and $\sigma_0$ are conformable matrices. Here $w^*$ represents true regressors, $\eta$
measurement errors, and $w$ observed regressors. The $x$ are observed and $w^*$, $\varepsilon$,
$\eta$, and $v$ may be unobserved. The last equation is a prediction equation for the true
regressors, where $x$ are observed predictor variables, $v$ is an unobserved prediction
error, and $\sigma_0$ is a scaling matrix, a square root of a variance matrix. Some of the
true regressors can be allowed to be observed, with $w$ equal to an element of $x$, by
specifying that corresponding elements of $\eta$ and $\sigma_0 v$ are identically zero, and the
corresponding element of $\pi_0' x$ is $w^* = w$. This model was considered by Hausman,
Ichimura, Newey, and Powell (1991) (HINP henceforth), for the special case where
$f(w^*, \delta)$ is a polynomial in $w^*$. As long as $x$ includes a constant, the location and
scale of $v$ can be normalized, e.g. as $E[v] = 0$ and $\mathrm{Var}(v) = I$ when the second
moment of $v$ exists.

Instrumental variables (IV) estimators can be used to estimate this model when
$f(w^*, \delta)$ is linear in $w^*$. Substituting $w - \eta$ for $w^*$ in the first equation leads
to $x$ being valid instruments, because the disturbance is linear in the measurement
error $\eta$. In the nonlinear case this substitution leads to residuals that are nonlinear
in $\eta$. Consequently, $x$ will not be valid instruments, and another approach has to be
adopted.
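The linear case can be illustrated with a small simulation. The sketch below is not taken from the paper: the design (a scalar regressor, unit coefficients, Gaussian errors with the variances shown) and the names `simulate_linear_eiv` and `cov` are assumptions chosen only for illustration. With $\mathrm{Var}(w^*) = \mathrm{Var}(\eta)$, least squares of $y$ on $w$ should be attenuated toward half of the true slope, while the simple IV ratio using $x$ as the instrument should be close to the true slope.

```python
import random


def simulate_linear_eiv(n, delta=1.0, seed=0):
    """Draw (x, w, y) from a linear errors-in-variables design (illustrative values)."""
    rng = random.Random(seed)
    xs, ws, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)            # prediction error, independent of x
        w_star = x + v                     # true regressor, Var(w*) = 2
        eta = rng.gauss(0.0, 2.0 ** 0.5)   # measurement error, Var(eta) = 2
        eps = rng.gauss(0.0, 1.0)          # regression disturbance
        xs.append(x)
        ws.append(w_star + eta)            # observed, mismeasured regressor
        ys.append(delta * w_star + eps)    # f is linear in w*
    return xs, ws, ys


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


xs, ws, ys = simulate_linear_eiv(200_000)
ols_slope = cov(ws, ys) / cov(ws, ws)   # attenuated by Var(w*) / (Var(w*) + Var(eta))
iv_slope = cov(xs, ys) / cov(xs, ws)    # x is uncorrelated with eps - delta * eta
print("OLS slope:", round(ols_slope, 3), "IV slope:", round(iv_slope, 3))
```

The IV ratio works here only because the composite disturbance $\varepsilon - \delta\eta$ is linear in $\eta$; the same device is not available once $f$ is nonlinear in $w^*$.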

An approach to consistent estimation can be based on integrating out the prediction
error. Let $g_0(v)$ be the density of $v$. Integrating over the prediction error leads
to three useful conditional expectation equations:

(2.2a)  $E[y \mid x] = \int f(\pi_0' x + \sigma_0 v, \delta_0)\, g_0(v)\, dv$,

(2.2b)  $E[wy \mid x] = \int [\pi_0' x + \sigma_0 v]\, f(\pi_0' x + \sigma_0 v, \delta_0)\, g_0(v)\, dv$,

(2.2c)  $E[w \mid x] = \pi_0' x$.

The first is a regression of $y$ on $x$, analogous to the usual one, except that the
unobserved variable $v$ has been integrated out. The second equation is a regression of
$wy$ on $x$, which is less familiar. The third equation is a standard regression
equation.
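To make the first two conditional moments concrete, the following sketch compares Monte Carlo averages with closed-form integrals in a scalar case. Everything specific here is an assumption made for illustration: $v$ is taken to be standard normal, the regression function is $f(w^*, \delta) = \delta_1 + \delta_2 w^* + \delta_3 e^{-w^*}$, and the values of the index $m = \pi' x$, of $\sigma$, and of $\delta$ are arbitrary. The closed forms use the normal moment generating function.

```python
import math
import random


def f(w_star, delta):
    """Illustrative regression function: d1 + d2 * w* + d3 * exp(-w*)."""
    d1, d2, d3 = delta
    return d1 + d2 * w_star + d3 * math.exp(-w_star)


def simulated_moments(m, sigma, delta, draws=200_000, seed=1):
    """Monte Carlo versions of E[y|x] and E[wy|x] at pi'x = m, with v ~ N(0, 1)."""
    rng = random.Random(seed)
    sum_f = 0.0
    sum_wf = 0.0
    for _ in range(draws):
        w_star = m + sigma * rng.gauss(0.0, 1.0)
        value = f(w_star, delta)
        sum_f += value
        sum_wf += w_star * value   # eta and epsilon drop out of E[wy|x]
    return sum_f / draws, sum_wf / draws


def closed_form_moments(m, sigma, delta):
    """Exact integrals for the same design, using w* ~ N(m, sigma^2)."""
    d1, d2, d3 = delta
    e = math.exp(-m + 0.5 * sigma ** 2)                   # E[exp(-w*)]
    ey = d1 + d2 * m + d3 * e
    ewy = d1 * m + d2 * (m ** 2 + sigma ** 2) + d3 * (m - sigma ** 2) * e
    return ey, ewy


sim = simulated_moments(1.0, 0.7, (1.0, 1.0, 1.0))
exact = closed_form_moments(1.0, 0.7, (1.0, 1.0, 1.0))
print("simulated:", sim, "closed form:", exact)
```

In this example both moments depend on $\sigma$ and on the shape of the distribution of $v$, which is why the density of the prediction error has to be modelled alongside the regression function.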

The  second  equation  is  important  for  identification  of  nonlinear  models.     The 
components of this equation corresponding to unobserved $w^*$ (i.e. those not corresponding
to  observed  covariates)  provide  information  additional  to  the  first  equation.     As  shown 
in  HINP  for  polynomial  regression,  the  first  equation  does  not  suffice  for 
identification.     Intuitively,  there  are  two  functions  that  need  to  be  identified,  the 
regression  function  and  the  density  of     v,     so  that  two  equations  are  needed  for 
identification.     It  was  shown  in  HINP  that  the  parameters  of  any  polynomial  regression 
equation  are  identified  from  these  two  equations,  and  one  expects  that  identification  of 
the  regression  parameters  will  hold  more  generally. 

It is beyond the scope of this paper to develop fully primitive identification
conditions for this model, but some things can be said. First, the parameters $\pi_0$ are
identified from (2.2c) as the coefficients of a least squares regression of $w$ on $x$,
so $\pi_0$ can be treated as known and identification of the other pieces of the model
considered by focusing on equations (2.2a) and (2.2b). If $\pi_0' x$ has a discrete
distribution with a finite support and $m$ points of positive probability, then (2.2a-b)
provide $2m$ equations. Assuming that none are redundant, i.e. that a "rank condition"
holds, one could identify $2m$ parameters from these equations, including $\delta_0$ and
parameters of a parametric family of distributions for $v$. For example, HINP showed
that, in the case where $w^*$ is a scalar and $f(w^*, \delta)$ is a polynomial of degree $p$, the
$\delta$ parameters are identified if the second moment matrix of the vector
$(1, \pi_0' x, \ldots)'$ is nonsingular. Also, some of the moments of $v$ are identified in
this case. If $\pi_0' x$ has a continuous distribution, then a simple counting argument
suggests that $f(w^*, \delta_0)$ and $g_0(v)$ should be identified. Assuming that the left hand
sides of (2.2a) and (2.2b) are distinct functions, these equations give two functional
equations, and there are two functions to be identified. So, by analogy with the finite
dimensional case, it should be possible, under appropriate regularity conditions, to
identify both the regression function for $w^*$ and the density function for $v$. Making
this intuition precise would be quite difficult, because of the nonlinear, nonparametric
(i.e. functional) nature of these equations, but it is an important problem deserving of
future attention.

Independence  of     x     and     v     is  a  strong  assumption,  but  in  the  general  nonlinear 

model  of  equation  (2.1)  it  is  difficult  to  drop  this  assumption.     Intuitively,   if  some 

moments  of     v     can  depend  on     x,     then  it  is  much  more  difficult  to  separate  the 

regression  function  from  the  distribution. 


3.        Estimation 

To describe the estimator it is helpful to embed the model in a more general
conditional moment setup. Let $z$ denote a data observation, $\beta$ a $q \times 1$ vector of
parameters, $g$ a density function of a random vector $v$, $\rho(z, \beta, g)$ an $r \times 1$
residual vector, and $H(z, \beta, v)$ an $r \times 1$ vector of functions, related as in

(3.1)  $\rho(z, \beta, g) = \int H(z, \beta, v)\, g(v)\, dv$.

Suppose that there is a set of conditioning variables $x$ such that for the true
parameter value $\beta_0$ and density $g_0$,

(3.2)  $E[\rho(z, \beta_0, g_0) \mid x] = 0$.


The nonlinear errors-in-variables model is a special case of this one, where

(3.3)  $H_1(z, \beta, v) = y - f(\pi' x + \sigma v, \delta)$,

       $H_2(z, \beta, v) = L\{wy - [\pi' x + \sigma v] \cdot f(\pi' x + \sigma v, \delta)\}$,

       $H_3(z, \beta, v) = w - \pi' x$,   $\beta = (\delta', \sigma, \pi')'$,

and $L$ is a selection matrix that picks out those elements of $w$ that include
measurement error.

The  common  approach  to  using  equation  (3.2)  in  estimation  is  nonlinear  instrumental 
variables.     One  difficulty  with  this  approach  is  that  the  density     g(v)     is  unknown. 
Another  difficulty  is  that  the  residual  is  an  integral  that  may  be  difficult  to  compute. 
Here,  these  difficulties  are  dealt  with  simultaneously,  by  choosing  a  flexible 
parameterization  for  the  density  that  makes  it  easy  to  use  a  simulation  estimator  of  the 
integral.     To  describe  this  approach,   we  begin  with  a  specification  of  the  density 
function. 


For now, suppose that the density is a member of a parametric family, of the form

(3.4)  $g(v, \gamma) = P(v, \gamma)\varphi(v)$,   $P(v, \gamma) = \sum_j \gamma_j p_j(v)$,

where $\varphi(v)$ is some fixed density function. For example, if $\varphi(v)$ were standard normal
and the $p_j(v)$ were powers of $v$, then this would be an Edgeworth approximation. The
function $g(v, \gamma)$ need not be positive, but leads to residuals that are linear in the
shape parameters $\gamma$ and that can easily be estimated by simulation.

For a density like that of equation (3.4), a simulated residual can be constructed
by drawing random variables from $\varphi(v)$ and then evaluating the product of the linear
combination $P(v, \gamma)$ and the $H$ functions. Let $z_i$ denote a single observation and
$[v_{i1}, \ldots, v_{iS}]$ denote a vector of random variables, each having marginal density
$\varphi(v)$. For example, if $\varphi(v)$ is a standard normal pdf, then $[v_{i1}, \ldots, v_{iS}]$ could be
computer generated Gaussian random numbers. Then a simulated residual for the $i$th
observation is

(3.5)  $\hat\rho_i(\theta) = S^{-1} \sum_{s=1}^{S} H(z_i, \beta, v_{is})\, P(v_{is}, \gamma)$,   $\theta = (\beta', \gamma')'$.

This is essentially an importance sampling estimator of the residual, where $\varphi(v)$ is the
sampling density and $P(v, \gamma)$ approximates $g(v)/\varphi(v)$. The simulated residual is an
unbiased estimator of the true residual, because

$E[\hat\rho_i(\theta) \mid z_i] = \rho(z_i, \beta, g(\gamma))$.
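The unbiasedness property can be checked numerically in a scalar example. The sketch below is only an illustration under assumed ingredients that do not come from the paper: the baseline density is standard normal, the shape function is $P(v, \gamma) = 1 + \gamma_4 (v^3 - 3v)/\sqrt{6}$ (a single third-order Hermite term), the regression function is $\delta_1 + \delta_2 w^* + \delta_3 e^{-w^*}$, and all numerical values and function names are hypothetical. The average of $H \cdot P$ over draws from the baseline density is compared with a trapezoid-rule evaluation of $\int H(z, \beta, v) P(v, \gamma) \varphi(v)\, dv$.

```python
import math
import random

SQRT6 = math.sqrt(6.0)


def phi(v):
    """Standard normal density, used as the baseline sampling density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def shape(v, gamma4):
    """P(v, gamma) = 1 + gamma4 * He3(v) / sqrt(6); it need not be positive."""
    return 1.0 + gamma4 * (v ** 3 - 3.0 * v) / SQRT6


def h_first(y, m, sigma, delta, v):
    """First residual function: y - f(m + sigma * v, delta), with m = pi'x."""
    d1, d2, d3 = delta
    w_star = m + sigma * v
    return y - (d1 + d2 * w_star + d3 * math.exp(-w_star))


def simulated_residual(y, m, sigma, delta, gamma4, draws):
    """Average of H * P over draws from phi, as in the simulated residual."""
    total = sum(h_first(y, m, sigma, delta, v) * shape(v, gamma4) for v in draws)
    return total / len(draws)


def exact_residual(y, m, sigma, delta, gamma4, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoid-rule value of the integral of H * P * phi over [lo, hi]."""
    step = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        v = lo + k * step
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * h_first(y, m, sigma, delta, v) * shape(v, gamma4) * phi(v)
    return total * step


rng = random.Random(2)
draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
args = (2.5, 1.0, 0.7, (1.0, 1.0, 1.0), 0.2)   # y, pi'x, sigma, delta, gamma4
rho_sim = simulated_residual(*args, draws)
rho_exact = exact_residual(*args)
print("simulated:", round(rho_sim, 4), "quadrature:", round(rho_exact, 4))
```

Because the draws come from the fixed baseline density, the same random numbers can be reused for every trial value of the parameters, and the residual stays smooth and linear in the shape coefficient.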


Therefore, by the results of McFadden (1989) and Pakes and Pollard (1989), an
instrumental variables (IV) estimator with $\hat\rho_i(\theta)$ as the residual should be consistent
if the IV estimator with the true residual is. An IV estimator can be formed in a
familiar way. Let $A(x)$ denote a $q \times r$ matrix of instrumental variables, which may be
estimated. Suppose that $\hat\theta$ solves

(3.6)  $\sum_{i=1}^{n} A(x_i)\, \hat\rho_i(\hat\theta) = 0$.

This is a simulated, nonlinear IV estimator like that of McFadden (1989).

Because equation (3.6) is linear in $P(v, \gamma)$, it is important to normalize the
density $P(v, \gamma)\varphi(v)$ to integrate to one. Also, it may be important to impose a location
and scale normalization on this density. There are different ways to impose
normalizations by imposing constraints on the coefficients. For example, if $\varphi(v)$ is
the standard normal density and $p_1(v), p_2(v), \ldots$ are the Hermite polynomials that are
orthonormal with respect to the standard normal density (i.e. $\int p_j(v)^2 \varphi(v)\, dv = 1$ and
$\int p_j(v) p_k(v) \varphi(v)\, dv = 0$ for $j \neq k$), then $\gamma_1 = 1$, $\gamma_2 = 0$, and $\gamma_3 = 0$ will imply that
$P(v, \gamma)\varphi(v)$ integrates to one, and has zero mean and unit variance. It is also possible
to impose such constraints using the simulated values, by requiring that $\gamma$
satisfy $\sum_{i=1}^{n} \sum_{s=1}^{S} (1, v_{is}, v_{is}^2)\, P(v_{is}, \gamma) = 0$.
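The effect of the coefficient constraints in the Hermite example can be verified by numerical integration. The sketch below assumes a scalar $v$, takes the first four orthonormal Hermite polynomials to be $1$, $v$, $(v^2 - 1)/\sqrt{2}$, and $(v^3 - 3v)/\sqrt{6}$, and uses an arbitrary value for the free fourth coefficient; the function names are hypothetical. Setting the coefficient on the constant to one and the coefficients on the linear and quadratic polynomials to zero should give integral one, mean zero, and second moment one, whatever the fourth coefficient is.

```python
import math


def phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def hermite_orthonormal(v):
    """First four Hermite polynomials, orthonormal under the standard normal."""
    return [1.0,
            v,
            (v * v - 1.0) / math.sqrt(2.0),
            (v ** 3 - 3.0 * v) / math.sqrt(6.0)]


def density_moments(gamma, lo=-12.0, hi=12.0, steps=6000):
    """Trapezoid-rule integral, mean, and second moment of P(v, gamma) * phi(v)."""
    step = (hi - lo) / steps
    total = mean = second = 0.0
    for k in range(steps + 1):
        v = lo + k * step
        weight = 0.5 if k in (0, steps) else 1.0
        p_value = sum(g * p for g, p in zip(gamma, hermite_orthonormal(v)))
        mass = weight * p_value * phi(v) * step
        total += mass
        mean += v * mass
        second += v * v * mass
    return total, mean, second


# Constant coefficient 1, linear and quadratic coefficients 0, free cubic coefficient.
integral, mean, second_moment = density_moments([1.0, 0.0, 0.0, 0.3])
print(integral, mean, second_moment)
```

The cubic term changes the skewness of the implied density but, by orthogonality, leaves the first three normalizations untouched.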

In the nonlinear errors in variables model, it is convenient to work with a two step
estimator, where the first step consists of estimation of $\pi$ by least squares (LS), and
the second step is an instrumental variables estimator using the first two residuals of
equation (3.3). The first order conditions for such an estimator can be formulated as a
solution to equation (3.6), if $A(x)$ is chosen in a particular fashion. Let $\hat\pi$ be the
LS estimator and

(3.7)  $A(x) = \mathrm{diag}[B(x), x]$,

where $B(x)$ has two columns and number of rows equal to the number of elements of
$(\delta', \sigma, \gamma')$. Suppose that constraints are imposed on the $\gamma$ coefficients such that
$\sum_{s=1}^{S} P(v_{is}, \gamma) = 0$. Then the solution to equation (3.6), with $A(x)$ specified as in
equation (3.7), requires that $\hat\pi$ be the least squares estimator, and that the other
parameters solve the equation

(3.8)  $0 = \sum_{i=1}^{n} B(x_i)\, \hat\rho_i(\hat\alpha)$,   $\alpha = (\delta', \sigma, \gamma')'$,

       $\hat\rho_i(\alpha) = \begin{pmatrix} y_i \\ L w_i y_i \end{pmatrix} - S^{-1} \sum_{s=1}^{S} \begin{pmatrix} f(\hat\pi' x_i + \sigma v_{is}, \delta) \\ L(\hat\pi' x_i + \sigma v_{is})\, f(\hat\pi' x_i + \sigma v_{is}, \delta) \end{pmatrix} P(v_{is}, \gamma)$.


In the empirical example the estimator minimizes a quadratic form that has this type of
equation as its first order condition, although the normalization $\sum_{s=1}^{S} P(v_{is}, \gamma) = 0$ is
not imposed. Specifically, for $C(x)$ equal to a vector of instrumental variables and $W$
a positive definite matrix, $\hat\alpha$ solves

(3.9)  $\min_{\alpha}\ \Big[\sum_{i=1}^{n} C(x_i)\, \hat\rho_i(\alpha)\Big]'\, W\, \Big[\sum_{i=1}^{n} C(x_i)\, \hat\rho_i(\alpha)\Big]$.

The first order conditions to this minimization problem are as given in equation (3.8),
with

(3.10)  $B(x) = \Big[\partial \sum_{i=1}^{n} C(x_i)\, \hat\rho_i(\hat\alpha) / \partial \alpha\Big]'\, W$.


Standard large sample theory for IV can be used for asymptotic inference procedures.
If the simulated values $(v_{i1}, \ldots, v_{iS})$ are included with the data to form an augmented
observation for the $i$th data point, then the usual IV formulae can be used to form a
consistent variance estimator. For example, suppose that $(z_i, v_{i1}, \ldots, v_{iS})$ are
independent observations as $i$ varies. Then under standard regularity conditions (e.g.
see Newey and McFadden, 1994), the asymptotic variance of $\sqrt{n}(\hat\theta - \theta_0)$ can be estimated
by

(3.11)  $\hat V = \hat G^{-1} \hat\Omega\, \hat G^{-1\prime}$,   $\hat G = n^{-1} \sum_{i=1}^{n} A(x_i)\, \partial \hat\rho_i(\hat\theta) / \partial \theta$,

        $\hat\Omega = n^{-1} \sum_{i=1}^{n} A(x_i)\, \hat\rho_i(\hat\theta)\, \hat\rho_i(\hat\theta)'\, A(x_i)'$.

If the simulation draws $v_{is}$ are mutually statistically independent as $s$ varies for a
given $i$, then one could also use the estimator

(3.12)  $\tilde V = \hat G^{-1} \tilde\Omega\, \hat G^{-1\prime}$,   $\tilde\Omega = n^{-1} \sum_{i=1}^{n} A(x_i)\, \hat\Lambda_i\, A(x_i)'$,

        $\hat\Lambda_i = S^{-1} \sum_{s=1}^{S} H(z_i, \hat\beta, v_{is})\, H(z_i, \hat\beta, v_{is})'\, P(v_{is}, \hat\gamma)^2$.


Both  of  these  variance  estimators  ignore  estimation  of  the  instruments,   which  is  valid 
under  standard  regularity  conditions.     Because  the  large  sample  theory  for  these 
estimators  is  straightforward,  we  do  not  give  regularity  conditions  here. 


4.        Consistent  Semiparametric  Estimation 


If the functional form of the density $g_0(v)$ is left unspecified, then the model
becomes semiparametric. Models where identification is achieved by conditional moment
restrictions like those of equation (3.1) are nonlinear, nonparametric simultaneous
equations models. Newey and Powell (1991) have considered estimation of such models, and
their result can be applied here. The basic idea is to apply the previous estimator, but
with $P(v, \gamma)$ chosen to be a member of an increasing sequence of approximating families
and the IV equation (3.6) replaced by a nonparametric conditional expectation equation.

Let $\mathcal{G}$ be a set of functions of $v$ that will be assumed to include the true
density $g_0(v)$ and satisfy other regularity conditions given below. Also, let
$\{P_J(v, \gamma)\}$ be a sequence of families such that $P_J(v, \gamma)\varphi(v)$ can approximate any
function. Let $\mathcal{G}_J = \{P_J(v, \gamma)\varphi(v)\} \cap \mathcal{G}$, $\theta = (\beta, g)$ be the parameter consisting of the
Euclidean vector $\beta$ and a density $g$, and $\hat\rho_i(\theta)$ be the simulated residual of equation
(3.5) with $P_J(v, \gamma)$ replacing $P(v, \gamma)$. Let $\hat E[\hat\rho_i(\theta) \mid x_i]$ be a nonparametric estimator
of the conditional expectation of $\hat\rho_i(\theta)$ given $x_i$, such as a series or kernel
estimator. Then a minimum distance estimator of $\theta_0 = (\beta_0, g_0)$ is

(4.1)  $\hat\theta = \arg\min_{\theta \in B \times \mathcal{G}_{\hat J}} \hat Q(\theta)$;   $\hat Q(\theta) = \sum_{i=1}^{n} \hat E[\hat\rho_i(\theta) \mid x_i]'\, \hat D\, \hat E[\hat\rho_i(\theta) \mid x_i] / n$,

where $\hat D$ is a positive definite matrix and $\hat J$ can depend on the data and on sample
size. The objective function in equation (4.1) is a sample analog of

$Q(\theta) = E\big[E[\rho(z, \theta) \mid x]'\, D\, E[\rho(z, \theta) \mid x]\big]$,

where $D$ is the limit of $\hat D$ and $\rho(z, \theta) = \int H(z, \beta, v)\, g(v)\, dv$. If $D$ is positive definite
and $\theta_0$ is identified from the conditional moment equation (3.2) (i.e. that equation has
a unique solution), then $Q(\theta)$ will have a unique minimum of zero at $\theta_0$. The general
extremum estimator reasoning (e.g. Newey and McFadden, 1994) then suggests that $\hat\theta$
should be consistent.

The estimator can be shown to be consistent if $\beta$ and $g$ are restricted to compact
sets, similarly to Gallant (1987). The compact function set assumption is a strong one,
but the results of Newey and Powell (1991) indicate its importance for minimum distance
estimators of the form considered here. For a matrix $A = [a_{ij}]$, let
$\|A\| = [\mathrm{trace}(A'A)]^{1/2}$, and for a function $g(v)$ let $\|g\|$ denote a function norm, to be
further discussed below.

Assumption 4.1:  $\beta_0 \in B$, which is compact, and $g_0 \in \mathcal{G}$, a compact set in a norm $\|g\|$.

In the primitive regularity conditions given below, $\|g\|$ will be a Sobolev norm.
The following dominance condition will be useful in showing uniform convergence.

Assumption 4.2:  There exists $M(z, v)$ such that for $\tilde\beta, \beta \in B$, $\|H(z, \beta, v)\| \le M(z, v)$,
$\|H(z, \tilde\beta, v) - H(z, \beta, v)\| \le M(z, v)\|\tilde\beta - \beta\|$.

Moment conditions for the dominating function $M(z, v)$ will be specified below.

To show uniform convergence of the objective function of equation (4.1), it is
useful to impose a strong condition on the function norm, that it dominates a weighted
supremum norm. Let $V$ denote the support of $\varphi(v)$, and

$\|g\|_\omega = \sup_{v \in V} |g(v)|\, \omega(v)$,   $\omega(v) > 0$.

Below it will be assumed that $\|g\|_\omega$ is dominated by $\|g\|$, so that Assumption 4.1
implies that $\|g\|_\omega$ is bounded on $\mathcal{G}$. The import of this assumption is a uniform bound
on the tail behavior of $g(v)$, imposed by the presence of the weight function; the
faster $\omega(v)$ grows as $v$ moves outward, the faster the tails of $g(v)$ must go to zero
in order to guarantee that $\sup_v \{|g(v)|\, \omega(v)\}$ is finite. Also, the nature of importance
sampling imposes a restriction on the tail thickness of the true density relative to the
baseline density. For second moment dominance this restriction will translate into a
restriction on $\omega(v)$ relative to $\varphi(v)$. These considerations lead to the following
assumption:

Assumption 4.3:  $\|g\|_\omega \le \|g\|$ for $g \in \mathcal{G}$ and there exists $\epsilon > 0$ such that
$E\big[\int M(z, v)^{2+\epsilon}\, [\omega(v)\varphi(v)]^{-1-\epsilon}\, \omega(v)^{-1}\, dv\big] < \infty$.

In order to guarantee that the parametric approximation suffices for consistency, the
following denseness condition will be imposed.

Assumption 4.4:  For any $g \in \mathcal{G}$ and $J$ there exists $P_J(v, \gamma)\varphi(v) \in \mathcal{G}_J$ such that
$\lim_{J \to \infty} \|P_J(\cdot, \gamma)\varphi - g\| = 0$.

This condition specifies that $g_0$ can be approximated by the family.

It is necessary to make some assumption concerning the conditional expectations
estimator. The following condition is lifted from Newey and Powell (1991). Without
changing notation assume that the data observation $z_i$ includes the simulation draws
$(v_{i1}, \ldots, v_{iS})$. Assume that the data are stationary.

Assumption 4.5:  For $\epsilon > 0$ from Assumption 4.3, i) $\hat D \to_p D$, $D$ is positive definite,
and if $E[|\psi(z_i)|^{1+\epsilon/2}] < \infty$ then $\sum_{i=1}^{n} \psi(z_i)/n \to_p E[\psi(z_i)]$; ii) if $E[\psi(z)^{2+\epsilon}]$ is
finite, $\sum_{i=1}^{n} \|\hat E[\psi(z) \mid x_i] - E[\psi(z) \mid x_i]\|^2 / n \to_p 0$; iii) either a) $\hat E[\psi(z) \mid x_i] =
\sum_{j=1}^{n} w_{ij} \psi(z_j)$, $w_{ij} \ge 0$, $\sum_{j=1}^{n} w_{ij} = 1$, $(i, j = 1, \ldots, n)$, and if $E[|\psi(z)|^{1+\epsilon/2}] < \infty$,
$\sum_{i=1}^{n} \hat E[\psi(z) \mid x_i]/n = O_p(1)$; or b) $\hat E[\psi(z) \mid x_i] = P_i' \big(\sum_{j=1}^{n} P_j P_j'\big)^{-1} \sum_{j=1}^{n} P_j \psi(z_j)$.

Assumption 4.5 can easily be checked in some cases and is quite general. For instance,
if $z_i$ is i.i.d. then it is easy to use known results to show that Assumption 4.5 holds
for nearest neighbor and series estimators. For K-nearest-neighbor estimators with
$K \to \infty$, $K/n \to 0$, ii) follows by Lemma 8 of Robinson (1987) and Proposition 1 of Stone
(1977), while iii) a) holds by construction. For a series estimator of the form given in
iii) b), with $P_i$ containing $K$ elements such that any function with finite mean square
can be approximated arbitrarily well in mean-square for large enough $K$, ii) follows
from Lemma A.10 of Newey (1994a) and the arguments for Lemma A.11 as long as $K \to \infty$ and
$K/n^{\epsilon/(\epsilon+2)} \to 0$.

Neither of these results allows for data based $K$. It should be noted that
Assumption 4.5 restricts the form of randomness. Implicitly the form of the weights $w_{ij}$
in Assumption 4.5 and the approximating functions $P_j$ are restricted to not depend on
$\psi$. Thus, while they could be chosen based on some fixed $\psi$ (e.g. a linear combination
of $\rho(z, \tilde\theta)$ for some preliminary estimator $\tilde\theta$), they are not allowed to vary with $\psi$
(i.e. with $\theta$ in $\rho(z, \theta)$). Assumption 4.5 should also be "plug-compatible" with future
results on nonparametric conditional expectation estimators, such as those for time
series.

The last assumption specifies that $\hat J$ must go to infinity with the sample size.

Assumption 4.6:  $\hat J \to_p \infty$ as $n \to \infty$.

As mentioned earlier, the degree of approximation $\hat J$ can be random, in a very general
way. However, it should be noted that it is not restrictions on the growth rate of $\hat J$
that  are  used  to  obtain  consistency,   but  rather  the  restriction  of  the  function  to  a 
compact  set.     Often,   the  compactness  condition  will  require  that  higher  order  derivatives 
be  uniformly  bounded,   a  condition  that  will  have  more  "bite"  for  large  values  of     J, 
imposing  strong  constraints  on  the  coefficients  of  higher  order  terms. 

These assumptions and identification deliver the following consistency result:

Theorem 4.1:  If $E[\rho(z, \theta) \mid x] = 0$ has a unique solution on $B \times \mathcal{G}$ at $\theta_0$ and Assumptions
4.1 - 4.6 are satisfied, then $\|\hat\beta - \beta_0\| \to_p 0$ and $\|\hat g - g_0\| \to_p 0$.

It should be noted that the hypotheses of this theorem are not very primitive until the
norm $\|g\|$ is specified. Once that is specified, it may require some work to check
the other assumptions.

The following set of assumptions is sufficient to demonstrate that the assumptions
are not vacuous, and do cover cases of some interest.

Assumption 4.7:  i) $v$ is one-dimensional; ii) There is a compact interval $V$ and a
fixed constant $B$ such that $\mathcal{G} = \{g(v) : g(v) = 0$ for $v \notin V$, $\sup_V |g(v)| \le B$,
$|g(\tilde v) - g(v)| \le B|\tilde v - v|$ for all $v, \tilde v \in V\}$; iii) $\|g\| = \sup_V |g(v)|$; iv) The support of
$\varphi(v)$ is $V$ and $\varphi(v)$ is continuous and bounded away from zero on $V$; v) $P_J(v, \gamma) =
\sum_j \gamma_j v^j$; vi) $E[\sup_{v \in V} M(z, v)^{2+\epsilon}] < \infty$.

Corollary 4.2:  If Assumptions 3.1, 4.2, 4.5 - 4.7 are satisfied, $\beta_0 \in B$,
and $B$ is compact, then $\|\hat\beta - \beta_0\| \to_p 0$ and $\|\hat g - g_0\| \to_p 0$.


This result is restrictive in several ways. It is easy to relax the assumption that $v$
is one dimensional, using the results of Elbadawi, Gallant, and Souza (1983). It is more
difficult to allow for noncompact support for $v$, although this extension is possible
using the results of Gallant and Nychka (1987). Unfortunately, their result allows for
quite thick tails, with $\omega(v)$ proportional to a power of $(1 + v'v)$ in Assumption 4.3. This
tail behavior does not allow Assumption 4.3 to be satisfied when $\varphi(v)$ is the standard
normal density. Of course, there are fast computational methods for generating data from
densities proportional to a power of $(1 + v'v)$, so that one could easily use such
thick-tailed baseline densities. Also, it should be possible to develop intermediate
conditions that allow for more general simulators.


5.        A  Sampling  Experiment 

A small Monte Carlo study provides a rough check of whether the estimator can give
satisfactory results in practice.  Consider the model

(5.1)  $y = \delta_1 + \delta_2 w^* + \delta_3 e^{-w^*} + \varepsilon$,   $\delta_1 = \delta_2 = 1$,   $\varepsilon$ is $N(0,1)$,

       $w = w^* + \eta$,   $\eta$ is $N(0,1)$,

       $w^* = \pi_1 + \pi_2 x + v$,   $\pi_1 = \pi_2 = 1$,   $x$ and $v$ are $N(0,.5)$.


The regression equation for this model is one that is useful in estimating the
relationship between consumption and income.  This specification will be further
discussed in Section 6, where it is used in the empirical example.  The parameter values
were set so that the r-squared for the prediction equation for $w^*$ was 1/2, and so that the
signal to noise ratio was 1.  The number of observations was set to 100, which is small
relative to typical sample sizes in economics, to make computation easier.  The r-squared
for the regression of $w^*$ on $x$ was set higher than typical in order to offset the small
sample size, so that the estimator might be informative.
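
The design in (5.1) is simple enough to simulate directly. The following is a minimal sketch in Python with NumPy, not the code used for the study. It assumes that N(0,.5) denotes a variance of .5 and sets $\delta_3 = 1$ purely for illustration; the function name `simulate_design` is hypothetical. A large sample is drawn only to check that the design gives an r-squared of about 1/2 for the prediction equation and a signal to noise ratio of about 1.

```python
import numpy as np

def simulate_design(n, rng, delta=(1.0, 1.0, 1.0), pi=(1.0, 1.0)):
    """Draw one sample from the Monte Carlo design in (5.1).

    Assumptions: N(0, .5) is read as variance .5, and delta[2] = 1 is illustrative.
    """
    x = rng.normal(0.0, np.sqrt(0.5), n)       # observed predictor
    v = rng.normal(0.0, np.sqrt(0.5), n)       # prediction error
    eta = rng.normal(0.0, 1.0, n)              # measurement error
    eps = rng.normal(0.0, 1.0, n)              # regression disturbance
    w_star = pi[0] + pi[1] * x + v             # true regressor
    w = w_star + eta                           # mismeasured regressor
    y = delta[0] + delta[1] * w_star + delta[2] * np.exp(-w_star) + eps
    return y, w, x, w_star

rng = np.random.default_rng(0)
y, w, x, w_star = simulate_design(200_000, rng)

# Share of Var(w*) explained by the predictor, and Var(w*) / Var(eta).
r2_prediction = np.var(1.0 + x) / np.var(w_star)
signal_to_noise = np.var(w_star) / np.var(w - w_star)
```

With 100 observations, as in the study, the same function would be called once per replication with `n=100`.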

Table One reports the results from 100 replications.  Results for three different
estimators of $\delta_1$, $\delta_2$, and $\delta_3$ are reported.  The first estimator is an ordinary least
squares (OLS) regression of $y$ on the right-hand side variables $(1, w, e^{-w})$ that are
measured with error.  The second estimator is an instrumental variables estimator (IV)
with the same right-hand side but with instruments $R_i = (1, h_i, h_i^2, h_i^3)'$, where $h_i = \hat{\pi}_1
+ \hat{\pi}_2 x_i$.  The third estimator is a simulated moment estimator (SM) from equation (3.9),
with $C(x_i) = I \otimes R_i$ and $W = I \otimes (\sum_i R_i R_i')^{-1}$, where $I$ is a two dimensional identity
matrix.  This estimator is a system two-stage least squares estimator where the
instrumental variables are $R_i$.  Also, $P(v,\gamma)$ was a Hermite polynomial of the
third-order, where $\gamma_0 = 1$, $\gamma_1 = \gamma_2 = 0$, and $\gamma_3$ was estimated.  There were two
simulations per observation.
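
To illustrate how a single third-order Hermite term changes the distribution of $v$, the sketch below evaluates a weight of the form $P(v,\gamma) = 1 + \gamma_3 He_3(v)$ against a standard normal baseline. It assumes the probabilists' Hermite polynomial $He_3(v) = v^3 - 3v$; the normalization used in the paper may differ, and the function names `hermite_weight` and `normal_expectation` are hypothetical. Expectations are computed by Gauss-Hermite quadrature rather than simulation so that the result is exact for polynomials.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval, hermegauss

def hermite_weight(v, gamma3):
    """P(v, gamma) = 1 + gamma3 * He_3(v), i.e. gamma_0 = 1 and gamma_1 = gamma_2 = 0."""
    return hermeval(v, [1.0, 0.0, 0.0, gamma3])

def normal_expectation(h, degree=40):
    """E[h(v)] for v ~ N(0, 1), by Gauss-Hermite quadrature (weight exp(-v^2 / 2))."""
    nodes, weights = hermegauss(degree)
    return np.sum(weights * h(nodes)) / np.sqrt(2.0 * np.pi)

gamma3 = 0.05  # illustrative value only

# The weighted density still integrates to one because E[He_3(v)] = 0.
total_mass = normal_expectation(lambda v: hermite_weight(v, gamma3))

# The third moment becomes 6 * gamma3, so the term introduces asymmetry.
third_moment = normal_expectation(lambda v: v**3 * hermite_weight(v, gamma3))
```

In a simulated moment calculation the same weight would multiply each simulated term, so that an average over normal draws approximates an expectation under the density $P(v,\gamma)\varphi(v)$. Note that this weight can be negative for large $|v|$, so it is a flexible approximation rather than a guaranteed density.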

In one replication out of the 100 the estimator did not converge to a stationary
point.  This replication was excluded from the results, which are reported in Table One.
The estimator shows promise.  The standard errors of the IV and simulated moment
estimators are much larger than those of the OLS estimator, but the biases are substantially
smaller.  As previously noted, the IV estimator is inconsistent, although in this example
it leads to bias reduction.  It is interesting to note that the standard error of the SM
estimator is smaller than that of the IV estimator.  Thus, in this example the valid SM
correction for measurement error leads to both smaller bias and variance than the
inconsistent IV correction.
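
The three columns reported for each coefficient in Table One are linked by the identity RMSE$^2$ = Bias$^2$ + SE$^2$. As a small consistency check, the sketch below recomputes the RMSE for $\delta_1$ from the bias and standard error figures printed in Table One; the dictionary and function names are illustrative.

```python
import math

# (bias, SE, reported RMSE) for delta_1, as printed in Table One.
table_one_delta1 = {
    "OLS": (-1.10, 0.25, 1.13),
    "IV": (-0.30, 3.31, 3.32),
    "SM": (-0.09, 2.27, 2.27),
}

def rmse(bias, se):
    """Root mean squared error implied by a bias and a standard error."""
    return math.sqrt(bias**2 + se**2)

recomputed = {name: rmse(b, s) for name, (b, s, _) in table_one_delta1.items()}
gaps = {name: abs(recomputed[name] - table_one_delta1[name][2]) for name in recomputed}
```

The recomputed values agree with the reported ones up to rounding, and they make clear that the OLS error is dominated by bias while the IV and SM errors are dominated by sampling variability.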


6.        An  Application  to  Engel  Curve  Estimation 

The  application  presented  here  is  estimation  of  Engel  curves,  a  subject  that  has 
long  been  of  interest  in  econometrics.     Measurement  error  has  recently  been  shown  in 
Hausman,  Newey,   and  Powell  (1995)  to  be  important  in  the  estimation  of  nonlinear  Engel 
curves.  This section adds to that work by estimating a nonlinear, nonpolynomial Engel
curve for the model of equation (2.1), a specification that was not estimated in
Hausman,   Newey,   and  Powell   (1995).     Also,  the  results  here  take  account  of  measurement 
error  in  the  denominator  of  the  share  equation. 

The functional form considered here is that preferred by Leser (1963),

(5.1)  $S_i = \delta_1 + \delta_2 \ln(I_i^*) + \delta_3 (1/I_i^*) + \varepsilon_i$,

where $S_i$ is the share of expenditure on a commodity and $I_i^*$ is the true total
expenditure.  As suggested by the Hausman, Newey, and Powell (1995) tests of the Gorman
(1981) rank restriction, a rank two specification such as this may be a good
specification, once the measurement error has been accounted for.

In addition, a specification is considered that accounts for the presence of $I_i^*$
in the denominator of the left-hand side of this equation.  This "denominator problem"
results from the fact that $S_i = Y_i/I_i^*$, where $Y_i$ is the expenditure on the commodity.
Thus, if $I_i^*$ is measured with error, another nonlinear measurement error problem
results from using the measured shares.  This problem can be dealt with by bringing $I_i^*$
out of the denominator, giving

(5.2)  $Y_i = \delta_1 I_i^* + \delta_2 I_i^* \ln(I_i^*) + \delta_3 + I_i^* \varepsilon_i$.

If $\varepsilon_i$ satisfies the usual restriction $E[\varepsilon_i | I_i^*] = 0$, then equations (5.1) and (5.2)
are equivalent statistical specifications, in that running least squares on either
equation should give a consistent estimator.  Covariates will also be allowed in this
specification by allowing additional variables $I_i^* x_{1i}$ to enter linearly in this
equation, corresponding to inclusion of $x_{1i}$ as additional regressors in the share
equation (5.1).

The measurement error will be assumed to be multiplicative, i.e. for $I_i$ equal to
the observed total expenditure,

(5.3)  $\ln(I_i^*) = w_i^* = \pi_0'x_i + v_i$,   $\ln(I_i) = w_i = \ln(I_i^*) + \eta_i = w_i^* + \eta_i$.

In the empirical work the predictor variables $x_i$ will be a constant, age and age
squared for household head and spouse, and dummies for educational attainment, spouse
employment, home ownership, industry, occupation, region, and black or white, a total of
19 variables, including the constant.  With this specification for the measurement and
prediction equations, $f(w^*,\delta) = \delta_1 + \delta_2 w^* + \delta_3 \exp(-w^*)$, as in the Monte Carlo example.

The measurement error in the left-hand side denominator can be accounted for as in
equation (5.2), leading to a specification with $f(w^*,\delta) = \delta_1 \exp(w^*) + \delta_2 \exp(w^*)w^* + \delta_3$.
It is interesting to note that even if the share equation is linear in $\ln(I_i^*)$, so
that $\delta_3 = 0$, this equation is nonlinear, so that IV will not be consistent.  Thus,
measurement error in the denominator of the share suggests the need for the estimators
developed here.

The  data  used  in  estimation  are  from  the  1982  Consumer  Expenditure  Survey   (CES). 
The  basic  data  we  use  are  total  expenditure  and  expenditure  on  commodity  groups  from  the 
first quarter of 1982.  Results were obtained for four commodity groups: food, clothing,
transportation, and recreation.  The number of observations in the data set is 1321.  The
empirical results are reported as elasticities, i.e. $d\ln f(x)/d\ln x$, as is common in
econometrics.  To compare shapes, elasticities were calculated at the quartiles of
observed expenditure.
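
As a worked illustration of the elasticity calculation, suppose the elasticity of interest is that of commodity expenditure $Y = S \cdot I$ with respect to total expenditure $I$ (an interpretation assumed here). For the share form $S = \delta_1 + \delta_2 \ln(I) + \delta_3/I$, differentiating $\ln Y = \ln S + \ln I$ gives $d\ln Y/d\ln I = 1 + (\delta_2 - \delta_3/I)/S$. The sketch below evaluates this at the quartiles listed in Table Two. The coefficient values are hypothetical and are not estimates from the paper.

```python
import math

def expenditure_elasticity(total_exp, d1, d2, d3):
    """Elasticity of Y = S * I with respect to I when S = d1 + d2*ln(I) + d3/I."""
    share = d1 + d2 * math.log(total_exp) + d3 / total_exp
    return 1.0 + (d2 - d3 / total_exp) / share

# Hypothetical coefficients, chosen only so that the share is positive.
coefs = (0.9, -0.08, 50.0)
quartiles = (3373.0, 4574.0, 6417.0)  # quartiles listed in Table Two

elasticities = [expenditure_elasticity(q, *coefs) for q in quartiles]

def numerical_elasticity(total_exp, d1, d2, d3, h=1e-5):
    """Central-difference check of d ln(Y) / d ln(I)."""
    def log_y(log_i):
        i = math.exp(log_i)
        return math.log(d1 * i + d2 * i * math.log(i) + d3)
    x0 = math.log(total_exp)
    return (log_y(x0 + h) - log_y(x0 - h)) / (2.0 * h)
```

A negative $\delta_2$ yields elasticities below one that fall with total expenditure, which is the qualitative pattern of a necessity.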

The results are given in Tables Two through Five.  Table Two gives some sample
statistics, including the quartiles of the income distribution.  The other tables
include estimated expenditure elasticities at these quartiles.  Table Two also gives
information on the prediction regression.  The $R^2$ in this regression is .23, which is
quite sizable for such a cross-section data set.  The other information is useful in
calculating the magnitude of the measurement error and bounding the size of the variance
of the prediction error $v$.  In particular, the model we have assumed implies that the
standard error .45 of the residual is an upper bound on the standard deviations of both
the measurement error and the prediction error $v$.
Also, given an estimator $\hat{\sigma}$ of $Var(v)^{1/2}$, an estimator of the $R^2$ of the measurement
equation, that determines the magnitude of the measurement error bias in a linear model,
is $Var(\pi'x + v)/Var(w) = [Var(\pi'x) + \hat{\sigma}^2]/Var(w) = [(.25)^2 + \hat{\sigma}^2]/(.51)^2 \approx .24 + (3.8)\hat{\sigma}^2$.
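
The arithmetic behind this approximation can be checked in a few lines. The sketch below uses the Table Two figures (.25 for the standard error of the predicted value and .51 for the sample standard error of log expenditure); the function name `measurement_r2` is illustrative.

```python
def measurement_r2(sigma_v, sd_predicted=0.25, sd_w=0.51):
    """R-squared of the measurement equation implied by an estimate of Var(v)^(1/2)."""
    return (sd_predicted**2 + sigma_v**2) / sd_w**2

intercept = measurement_r2(0.0)    # roughly .24
slope = 1.0 / 0.51**2              # roughly 3.8, the coefficient on sigma^2
at_bound = measurement_r2(0.45)    # sigma at its upper bound: close to one, up to rounding
```

For example, a prediction error standard deviation of .35 would imply an $R^2$ of about .71 for the measurement equation, so that a little under a third of the variance of observed log expenditure would be attributed to measurement error.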

Tables Three to Five give results for each commodity for three different
specifications of the share equation and four different estimators.  Table Three gives
results for the share equation, where measurement error in the denominator of the
left-hand side is ignored.  This specification is the same as in the Monte Carlo study.
Table Four changes the specification to account for the left-hand side denominator by
multiplying through the original equation by total expenditure, as described above.
Table Five adds covariates $x_1$ to the share equation to allow for demographic and
regional price effects.  There are six covariates: own and spouse age, family size, and
three regional dummy variables.  The equation estimated is analogous to that of Table
Four in accounting for the left-hand side denominator, with $f(w^*,x_1,\delta) =
\delta_1 \exp(w^*) + \delta_2 \exp(w^*)w^* + \delta_3 + \exp(w^*)x_1'\delta_4$.  It should be noted that this specification
restricts family size to be absent from the prediction equation.

Tables Three to Five report results for four different estimators: ordinary least
squares (LS), two stage least squares (IV) with instruments described below, the
simulated moment estimator with Gaussian $v$ (SMO), and the simulated moment estimator
with one Hermite polynomial term (SMI), of the third order, included in the moment
functions.  The simulated moment estimators are each obtained as in equation (3.9), with
$\rho_i(\alpha)$ as given in equation (3.8), 10 simulation draws, and $W$ equal to the inverse of
an estimated asymptotic variance of $\sum_{i=1}^n C(x_i)\rho_i(\alpha)/\sqrt{n}$.  Specifically,

(5.4)  $W = \hat{\Sigma}^{-1}$,   $\hat{\Sigma} = n^{-1}\sum_{i=1}^n \hat{U}_i\hat{U}_i'$,

       $\hat{U}_i = C(x_i)\rho_i(\tilde{\alpha}) + [\sum_{j=1}^n \partial C(x_j)\rho_j(\tilde{\alpha})/\partial\pi](\sum_{j=1}^n x_j x_j')^{-1} x_i (w_i - \hat{\pi}'x_i)$,

where $\tilde{\alpha}$ is an initial consistent estimator (see footnote 3).  This is an asymptotic
variance minimizing choice of $W$ that accounts for the presence of $\hat{\pi}$ in $\rho_i$.

The standard errors for LS and IV were calculated from heteroskedasticity consistent
formulae, e.g. as given in White (1982).  The standard errors for simulated moment
estimators were calculated from the GMM asymptotic variance estimator $(H'\hat{\Sigma}^{-1}H)^{-1}$, where
$H = \partial \sum_{i=1}^n C(x_i)\rho_i(\hat{\alpha})/\partial\alpha$.
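
The weighting matrix in (5.4) and the standard error formula can be sketched with a few matrix operations. The code below is a minimal illustration under stated assumptions, not the paper's implementation: the moment contributions, Jacobians, and residuals are random placeholders, the function names are hypothetical, and the Jacobian `H_bar` is taken to be an average over observations, so that the variance of the estimator is $(\bar{H}'\hat{\Sigma}^{-1}\bar{H})^{-1}/n$ under that scaling.

```python
import numpy as np

def corrected_moments(g, G_pi, X, first_stage_resid):
    """Rows U_i = g_i + G_pi (X'X)^{-1} x_i (w_i - pi'x_i), as in equation (5.4).

    g: (n, m) array whose rows are C(x_i) rho_i(alpha);
    G_pi: (m, p) array, the sum over j of d[C(x_j) rho_j(alpha)]/d pi;
    X: (n, p) first-stage regressors; first_stage_resid: (n,) residuals.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    influence = (X * first_stage_resid[:, None]) @ XtX_inv  # row i: [(X'X)^{-1} x_i e_i]'
    return g + influence @ G_pi.T

def weight_and_std_errors(U, H_bar):
    """W = Sigma^{-1} with Sigma = n^{-1} sum U_i U_i', plus GMM standard errors.

    Assumption: H_bar is the (m, k) average Jacobian of the moments in alpha.
    """
    n = U.shape[0]
    sigma = U.T @ U / n
    W = np.linalg.inv(sigma)
    avar = np.linalg.inv(H_bar.T @ W @ H_bar)  # variance of sqrt(n) * (alpha_hat - alpha)
    return W, sigma, np.sqrt(np.diag(avar) / n)

# Placeholder inputs (random numbers, not the paper's data).
rng = np.random.default_rng(0)
n, m, p, k = 400, 6, 3, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
g = rng.normal(size=(n, m))
G_pi = rng.normal(size=(m, p))
resid = rng.normal(size=n)
H_bar = rng.normal(size=(m, k))

U = corrected_moments(g, G_pi, X, resid)
W, sigma, std_errors = weight_and_std_errors(U, H_bar)
```

The second term in `corrected_moments` is what distinguishes this weighting matrix from the usual one: it adds the sampling variation that comes from estimating the prediction equation in a first step.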

A selection process was used to choose the order of powers of the predicted value
to include in the instruments.  Starting at the second order, the minimum needed to
have enough moments to allow estimation of distribution parameters, the order was
chosen by cross-validation on the food equation, Gaussian, simulated moment estimator
(SMO), using the cross-validation criterion for choice of instruments suggested in Newey
(1994b).  Inclusion of higher order powers did not result in any decrease in the
cross-validation criterion.  Consequently, in Tables Three and Four the instrumental
variables were $(1, x'\hat{\pi}, (x'\hat{\pi})^2)$.  In Table Five $\exp(x'\hat{\pi}) \cdot x_1$ was added to the
instruments, because of the presence of the covariates.
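
The instrument vector built from powers of the predicted value can be sketched as follows. This is an illustration under assumptions: the first stage is taken to be ordinary least squares of observed log expenditure on the predictors, the data are simulated placeholders, and the function name `power_instruments` is hypothetical.

```python
import numpy as np

def power_instruments(X, w, order=2):
    """Return instruments (1, h, h^2, ..., h^order), with h the first-stage fitted value."""
    pi_hat, *_ = np.linalg.lstsq(X, w, rcond=None)   # least squares prediction equation
    h = X @ pi_hat                                   # predicted value x'pi_hat
    R = np.column_stack([h**p for p in range(order + 1)])
    return R, pi_hat

# Placeholder data (not the CES sample).
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w = X @ np.array([8.4, 0.3]) + rng.normal(scale=0.45, size=n)

R, pi_hat = power_instruments(X, w, order=2)
```

Raising `order` adds higher powers of the predicted value, which is the dimension over which the cross-validation described above would search.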

The number of Hermite polynomial terms to include was chosen essentially by an
upwards testing procedure, applied in the model of Table Three.  Inclusion of a third
order term was tried in each case, as reported in Table Three.  This term allows for
asymmetry in the distribution of $v$.  If it was statistically significant, a fourth order
term was tried.  In none of the cases was this term significant, so only results for the
one, third order, Hermite polynomial term are reported in the tables.

For each estimator, elasticities at the quartiles, the estimate of $\sigma = Var(v)^{1/2}$,
and the estimator of the coefficient $\gamma$ of the Hermite polynomial term, as well as
standard errors (in parentheses below the estimates) are reported.  The (asymptotic)
t-statistic on the coefficient of inverse expenditure (t-stat) and the overidentification
(minimum chi-square) test (Q) statistic for the simulated moment estimator are also
reported.  The t-statistic is particularly relevant in Table Three because the 2SLS
estimator would be consistent if the coefficient on inverse expenditure were zero.
The degrees of freedom of the overidentification test statistic are 2 and 1
respectively for SMO and SMI, in Tables Three and Four, and 8 and 7 respectively in
Table Five.  The difference between these statistics for SMO and SMI is a one-degree of
freedom chi-squared test of the Hermite coefficient being zero.

3  The procedure used to obtain the initial consistent estimators was to begin with an
identity weighting matrix, use a few iterations to obtain "reasonable" parameter values,
choose $W$ as in equation (5.4), and then minimize to get the estimate of $\alpha$.
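
The degrees of freedom quoted for the overidentification test can be reproduced by counting moments and parameters, and the chi-squared difference test needs only the upper tail of a chi-squared distribution with one degree of freedom. The sketch below makes two assumptions that are not stated in this section: each instrument multiplies two residual equations, as in the Monte Carlo design, and the parameters are the three $\delta$ coefficients, $\sigma$, any covariate coefficients, and the Hermite coefficient where present. Function names are illustrative.

```python
import math

def overid_df(n_instruments, n_params, n_equations=2):
    """Number of moments minus number of parameters."""
    return n_equations * n_instruments - n_params

def chi2_one_df_pvalue(stat):
    """P(X > stat) for X chi-squared with one degree of freedom: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

# Tables Three and Four: instruments (1, x'pi, (x'pi)^2).
df_smo = overid_df(3, 4)          # three deltas and sigma
df_smi = overid_df(3, 5)          # plus one Hermite coefficient
# Table Five: six covariate instruments and six covariate coefficients are added.
df_smo_cov = overid_df(3 + 6, 4 + 6)
df_smi_cov = overid_df(3 + 6, 5 + 6)

def hermite_difference_test(q_smo, q_smi):
    """Difference of Q statistics and its chi-squared(1) p-value."""
    diff = max(q_smo - q_smi, 0.0)
    return diff, chi2_one_df_pvalue(diff)
```

A difference in Q statistics above about 3.84 would reject a zero Hermite coefficient at the five percent level.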

Even though the IV estimator is inconsistent, it gives results similar to the SM
estimator in a number of cases.  When the share denominator is allowed to be measured
with error there are larger differences between IV and SM.  The standard errors of SM are
smaller than those of IV, which is consistent with the Monte Carlo results of Section 5.
There are large differences between the OLS and SM estimators, as is consistent with the
presence of measurement error.  It is interesting to note that the elasticities for
transportation go down rather than up, unlike linear regression with measurement error.

In comparing Tables 3 and 4, it is apparent that accounting for measurement error in
the denominator leads to some changes in the results.  There is more nonlinearity in the
food equation in Table 4 than in Table 3.  The prediction error standard deviation $\sigma$ is
more precisely estimated in these equations.  The overidentification test statistics are
larger in Table 4.  Surprisingly, the estimated standard errors in Table 4 are not much
larger than those in Table 3, although Table 4 is a levels equation that is sometimes
thought to be more heteroskedastic than the share equation.  There is little evidence of
nonnormality.  In most cases SMO is quite similar to SMI, except that SMI has much larger
standard errors.

In summary, although allowing for nonnormality does not change the empirical
results, correcting for measurement error makes a big difference.  In several cases the
simulated moments estimator is quite different from the inconsistent IV estimator,
suggesting that the inconsistency of the IV estimator may not be uniformly small.
Furthermore, the simulated moment estimators seem quite accurate, having small standard
errors.  These results illustrate the usefulness of using simulated moment estimation to
correct for measurement error, while allowing some flexibility in the distribution of the
prediction error to assess the impact of allowing for nonnormality.

Appendix

Proof of Theorem 4.1:  The proof proceeds by verifying the hypotheses of Theorem 5.1 of
Newey and Powell (1991).  Let the norm for $\theta = (\beta,g)$ be $\|\theta\| = \|\beta\| + \|g\|$.  Note that
$\Theta = \mathcal{B} \times \mathcal{G}$ is compact by $\mathcal{B}$ and $\mathcal{G}$ compact.  For $S$ simulations let $Z$ denote the
augmented data vector, with $Z = (z,v_1,...,v_S)$.  Also, let $\rho(Z,\theta) =
S^{-1}\sum_{s=1}^S H(z,\beta,v_s)P(v_s,\gamma)$.  Note that for $P_0(v) = g_0(v)/\varphi(v)$, it follows by Assumptions
4.2 and 4.3 that by the triangle inequality

(A.1)  $\|\rho(Z,\theta_0)\| \le \sum_{s=1}^S \|H(z,\beta_0,v_s)\| |P_0(v_s)|/S \le \{\sum_{s=1}^S M(z,v_s)[\omega(v_s)\varphi(v_s)]^{-1}/S\}\|g_0\|$,

(A.2)  $\{E[\|\rho(Z,\theta_0)\|^{2+\epsilon}]\}^{1/(2+\epsilon)} \le C \cdot \sum_{s=1}^S \{E[\{M(z,v_s)[\omega(v_s)\varphi(v_s)]^{-1}\}^{2+\epsilon}]\}^{1/(2+\epsilon)}/S$

       $= C \cdot \{E[\int [M(z,v)/\omega(v)]^{2+\epsilon} \varphi(v)^{-1-\epsilon} dv]\}^{1/(2+\epsilon)} < \infty$.

It follows similarly to equation (A.1) that for $\tilde{\theta}, \theta \in \Theta$,

(A.3)  $\|\rho(Z,\tilde{\theta}) - \rho(Z,\theta)\| \le C \cdot \{\sum_{s=1}^S M(z,v_s)[\omega(v_s)\varphi(v_s)]^{-1}/S\}\|\tilde{\theta} - \theta\|$,

so that Assumption 5.1 of Newey and Powell (1991) follows by eq. (A.2).  Assumptions 5.2
and 5.3 then follow by Assumptions 4.4 and 4.5.  Furthermore, by the fact that
$E[\rho(Z,\theta)|x] = E[\rho(z,\beta,g)|x]$ for an unbiased simulator, as noted in the text, Newey and
Powell's (1991) Assumption 3.1 holds by Assumption 2.1.  The conclusion then follows by
the conclusion of Theorem 5.1 of Newey and Powell (1991).  QED.

Proof of Corollary 4.2:  The proof proceeds by verifying the hypotheses of Theorem 4.1.
Assumption 4.1 follows by hypothesis and the Arzela theorem, which gives compactness of
$\mathcal{G}$ in the sup norm.  Assumption 4.3 follows with $\omega(v) = 1$ by $\varphi(v)$ bounded away from
zero and Assumption 4.6, vi).  Assumption 4.4 follows by a Weierstrass approximation of
$g(v)/\varphi(v)$ by $P_J(v)$.  The proof then follows by the conclusion of Theorem 4.1.  QED.

Table One:  Monte Carlo Results
                  δ1                        δ2                        δ3
         Bias     SE     RMSE      Bias     SE     RMSE      Bias     SE     RMSE

OLS     -1.10    .25     1.13       .67    .13      .67       .85    .12      .86
IV       -.30   3.31     3.32       .13   1.25     1.26       .47   2.01     2.07
SM       -.09   2.27     2.27       .07   1.05     1.05       .31   1.62     1.65

Table  Two:     Some  Sample  Statistics 


25th  50th  75th 

Income  Quartiles  3373  4574  6417 

Sample  standard  error  of  log  of  expenditure  .51 

Standard  error  of  predicted  .25 

Standard  error  of  residual  .45 

R-squared  .23 


Table Three:  Elasticity Estimates for Share Equations

Food
         25th     50th     75th     σ        γ        t-stat   Q
LS       .72      .66      .59                        4.52
         (.02)    (.02)    (.03)
2SLS     .82      .78      .74                        47
         (.05)    (.04)    (.06)
SMO      .82      .78      .74      .61               3.67     6.36
         (.04)    (.04)    (.06)    (.08)
SMI      .84      .78      .71      .31      -.01     .04      6.08
         (.20)    (.09)    (.36)    (1.49)   (.05)

Clothing
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.21     1.08     .97                        18.66
         (.05)    (.04)    (.05)
2SLS     1.61     1.42     1.30                       2.25
         (.12)    (.09)    (.10)
SMO      1.63     1.40     1.26     .02               .56      6.11
         (.20)    (.10)    (.18)    (.38)
SMI      1.62     1.28     1.07     -.0009   .15      .20      1.99
         (.46)    (.30)    (.56)    (.0018)  (.11)

Transportation
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.28     1.44     1.50                       11.19
         (.07)    (.06)    (.07)
2SLS     .99      1.06     1.12                       1.00
         (.08)    (.08)    (.12)
SMO      1.02     1.01     1.01     1.71              .04
         (.07)    (.06)    (.06)    (2.01)
SMI      1.40     .98      .63      .10      .028     3.10
         (.27)    (.07)    (.18)    (.07)    (.018)

Recreation
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.40     1.20     1.06                       16.59
         (.07)    (.06)    (.07)
2SLS     1.70     1.31     1.07                       11.45
         (.15)    (.12)    (.15)
SMO      2.97     1.33     .39      .02               6.48     11.28
         (.63)    (.12)    (.48)    (.34)
SMI      6.98     2.32     1.28     .71      .024     5.58     7.54
         (4.40)   (.36)    (.12)    (.18)    (.002)

Table Four:  Elasticity Estimates for Level Equations

Food
         25th     50th     75th     σ        γ        t-stat   Q
LS       .68      .63      .57                        .31
         (.04)    (.03)    (.03)
2SLS     .90      .80      .70                        3.34
         (.08)    (.05)    (.05)
SMO      .98      .81      .63      .35               12.4     18.16
         (.08)    (.04)    (.05)    (.02)
SMI      1.21     .83      .46      .24      .008     5.09     16.42
         (.21)    (.06)    (.13)    (.06)    (.005)

Clothing
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.40     1.11     .89                        53.95
         (.11)    (.06)    (.04)
2SLS     2.04     1.50     1.21                       4.92
         (.26)    (.12)    (.13)
SMO      2.07     1.36     .96      .34               39.04    17.15
         (.21)    (.08)    (.07)    (.02)
SMI      2.14     1.37     .93      .33      .001     3.19     17.13
         (.58)    (.11)    (.18)    (.09)    (.009)

Transportation
         25th     50th     75th     σ        γ        t-stat   Q
LS       3.14     1.95     1.48                       1.68
         (1.06)   (.19)    (.09)
2SLS     .23      .93      1.53                       2.04
         (.65)    (.12)    (.53)
SMO      1.16     .94      .76      .64               38.27    10.63
         (.08)    (.05)    (.04)    (.04)
SMI      1.25     .97      .74      .58      .004     .69      10.52
         (.59)    (.20)    (.09)    (.27)    (.021)

Recreation
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.74     1.26     .95                        42.87
         (.19)    (.09)    (.05)
2SLS     1.84     1.32     1.01                       14.85
         (.28)    (.16)    (.15)
SMO      2.35     1.44     .95      .36               48.62    33.23
         (.26)    (.09)    (.07)    (.03)
SMI      7.85     1.93     .33      .25      .024     15.88    21.83
         (4.76)   (.34)    (.15)    (.02)    (.005)


Table Five:  Elasticity Estimates for Level Equations with Covariates

Food
         25th     50th     75th     σ        γ        t-stat   Q
LS       .72      .66      .58                        1.90
         (.05)    (.03)    (.03)
2SLS     .97      .85      .74                        4.40
         (.09)    (.06)    (.06)
SMO      1.00     .86      .73      .33               7.14     43.51
         (.08)    (.05)    (.05)    (.02)
SMI      1.25     .90      .58      .21      .009     2.47     41.59
         (.29)    (.07)    (.15)    (.07)    (.007)

Clothing
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.35     1.08     .88                        32.76
         (.11)    (.06)    (.04)
2SLS     2.01     1.50     1.22                       4.22
         (.31)    (.14)    (.15)
SMO      1.94     1.31     .94      .33               29.32    43.23
         (.22)    (.09)    (.08)    (.02)
SMI      2.01     1.33     .92      .31      .002     1.59     42.95
         (.73)    (.14)    (.23)    (.11)    (.011)

Transportation
         25th     50th     75th     σ        γ        t-stat   Q
LS       2.54     1.83     1.50                       1.30
         (.63)    (.14)    (.08)
2SLS     .05      .70      1.38                       3.09
         (.55)    (.16)    (.45)
SMO      1.07     .87      .69      .61               28.01    30.12
         (.09)    (.06)    (.04)    (.04)
SMI      1.86     1.13     .61      .40      .016     2.61     28.92
         (.87)    (.26)    (.07)    (.08)    (.009)

Recreation
         25th     50th     75th     σ        γ        t-stat   Q
LS       1.73     1.25     .95                        35.96
         (.19)    (.09)    (.05)
2SLS     1.78     1.31     1.01                       10.69
         (.30)    (.17)    (.16)
SMO      2.40     1.41     .88      .31               37.86    54.23
         (.32)    (.11)    (.08)    (.03)
SMI      8.38     1.91     .18      .21      .021     10.91    44.88
         (5.45)   (.32)    (.23)    (.03)    (.005)